UMass at TDT 2004
Abstract
Topic Detection classifies stories into different topics, but HTD requires more than that. Are there any other entities between a story and a topic? Reference [10] views a topic as a structure of inter-related events, which gives us a good hint for this new task. Experiments in [10] show that time locality is a very useful attribute in event organization, and it can also help to solve the complexity problem in TDT2004. The TDT-5 collection contains 407,503 stories in three different languages, and the running time of traditional clustering algorithms is not acceptable for such a huge collection. Since we know that stories in the same event tend to be close in time, we only need to compare a story to its "local" stories instead of the whole collection. The algorithm we use has two steps: bounded 1-NN for event formation and bounded agglomerative clustering for building the hierarchy. In the first step, all stories in the same original language and from the same source are extracted and ordered by time. Stories are processed one by one, and each incoming story is compared to a fixed number of the stories that precede it. This number is approximately the number of stories in a token file; its value is 100 for the baseline run. If the similarity (cosine similarity of tf-idf term vectors) between the current story and the most similar previous story is larger than a given threshold (0.3 in the baseline run), the current story will be assigned to the event that the most similar previous ...
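The first step described above (bounded 1-NN event formation) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names (`tfidf_vectors`, `cosine`, `bounded_1nn`), the smoothed idf formula, and computing idf over the whole batch are assumptions. The abstract is cut off before it says what happens when the best similarity does not exceed the threshold; the sketch assumes the story then starts a new event. The input is assumed to be already tokenized, time-ordered, and restricted to one language and one source.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight dicts) for a list of token lists.

    Assumption: idf = log((n + 1) / df), computed over the whole batch.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log((n + 1) / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bounded_1nn(docs, window=100, threshold=0.3):
    """Assign each time-ordered story to an event id.

    Each story is compared only to the `window` stories before it. If the
    best similarity exceeds `threshold`, the story joins that neighbour's
    event; otherwise (assumption) it starts a new event.
    """
    vectors = tfidf_vectors(docs)
    events = []
    next_event = 0
    for i, vec in enumerate(vectors):
        best_sim, best_j = 0.0, None
        for j in range(max(0, i - window), i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim > threshold:
            events.append(events[best_j])
        else:
            events.append(next_event)
            next_event += 1
    return events


stories = [
    ["quake", "japan", "tsunami"],
    ["quake", "japan", "rescue"],
    ["election", "vote", "senate"],
    ["election", "vote", "poll"],
]
print(bounded_1nn(stories))
```

Because each story is compared to at most `window` predecessors, the number of similarity computations is bounded by the number of stories times the window size. For 407,503 stories and a window of 100 that is at most about 4.1 × 10^7 comparisons, against roughly 8.3 × 10^10 for all pairs.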
Similar resources
UMass at TDT 2000
We had two thrusts to our research, neither of which was ready to be deployed in this evaluation. We report here on the results from the training data, in all cases explored within the link detection task. In the first direction, we looked more carefully at score normalization across different languages and media types. We found that we could improve results noticeably though not substantially ...
Distinct requirements for Ku in N nucleotide addition at V(D)J- and non-V(D)J-generated double-strand breaks.
Loss or addition of nucleotides at junctions generated by V(D)J recombination significantly expands the antigen-receptor repertoire. Addition of nontemplated (N) nucleotides is carried out by terminal deoxynucleotidyl transferase (TdT), whose only known physiological role is to create diversity at V(D)J junctions during lymphocyte development. Although purified TdT can act at free DNA ends, its...
TDT-2004: Adaptive Topic Tracking at Maryland
A topic tracking system that combines elements from vector space and language modeling frameworks to compute document scores is described. The model is used for both the traditional TDT topic tracking evaluation design and the new supervised adaptive topic tracking evaluation. Results indicate that supervised adaptation and score normalization should be more closely coupled, and that current te...
Mutational analysis of terminal deoxynucleotidyltransferase-mediated N-nucleotide addition in V(D)J recombination.
The addition of nontemplated (N) nucleotides to coding ends in V(D)J recombination is the result of the action of a unique DNA polymerase, TdT. Although N-nucleotide addition by TdT plays a critical role in the generation of a diverse repertoire of Ag receptor genes, the mechanism by which TdT acts remains unclear. We conducted a structure-function analysis of the murine TdT protein to determin...
Results of the 2003 Topic Detection and Tracking Evaluation
The National Institute of Standards and Technology (NIST) administered the sixth open evaluation of Topic Detection and Tracking (TDT) technologies in November of 2003. The TDT project supports development of technologies that automatically organize event-related news stories. The program leverages expertise in core technologies, Automatic Speech Recognition (ASR), Document Retrieval (DR), and M...
Publication date: 2004